A NEW APPROACH TO STRUCTURE PRESERVING FEATURE EXTRACTION *


Scott A. Starks

Dept. of Electrical Engineering
Rice University
Houston, Texas 77001


Abstract 

This paper presents an approach to nonlinear feature extraction
based on certain graph theoretical considerations (such as the
minimal spanning tree, maximally complete subgraphs, inconsistent
edges and diameter edges) and topological considerations (such as
interpoint distance measures). After appropriate introductory
sections, the feature extraction algorithm is developed in section 4.
The algorithm is hierarchical in nature and offers considerable
savings in terms of computation and storage requirements. An outline
of the computer procedure is also included.

1. Introduction

In the design of any practical pattern recognition system, the
problem of developing an efficient feature extractor is critical.
A number of methods for optimal feature extraction exist. These are
based on a variety of optimality criteria. A good review of the
methods of feature extraction can be found in [1].

Quite often in a real-world situation one is presented with the
problem that the dimensionality of the measurement space is large,
while the number of training samples is small. In this case,
obtaining good estimates of the class conditional statistics is
virtually impossible. As a result, all feature extraction algorithms
which base their optimality criteria upon these statistical
quantities are severely hindered. What is needed is a feature
extraction method which does not rest solely on statistical
considerations but which is based also on structural attributes
present in the training data.

In order to accomplish the above, this paper presents a new method
for nonlinear feature extraction. We begin this work with the general
problem statement. Let there be given a set of data vectors
$X = \{X_1, X_2, \ldots, X_N\}$ where each $X_i \in R^n$. We wish to
find a corresponding set of vectors $Y = \{Y_1, Y_2, \ldots, Y_N\}$
where each $Y_i \in R^m$ (where $m$ is an integer satisfying
$1 < m < n$), in such a way that the structure present in $X$ is
optimally preserved under the transformation from $X$ to $Y$; that
is, we wish that the mapping of $X$ to $Y$ takes place with the least
degradation in structure. Of course, the central question arises as
to what constitutes structure in a data set. Thus, the present paper
directs much of its attention toward this question.
2. Background

In [2], J. W. Sammon developed a point-to-point mapping from a given
space to one of lower dimensionality. In his algorithm, Sammon, by an
iterative technique, optimized a criterion functional based upon
interpoint distances in each space. If $d_{ij}$ represents the
Euclidean distance between points $X_i$ and $X_j$ in $R^n$ (this
distance will also appear at times as $d(X_i, X_j)$) and $d^*_{ij}$
is the corresponding Euclidean distance between $Y_i$ and $Y_j$ in
$R^m$, then the mapping criterion is defined as

$$E = \frac{1}{d} \sum_{i<j} \frac{(d_{ij} - d^*_{ij})^2}{d_{ij}}$$

where

$$d = \sum_{i<j} d_{ij}.$$

Generally speaking, this indicates that in optimizing this criterion,
one finds the configuration of points $Y$ in $R^m$ whose interpoint
distances best match their counterparts in $R^n$. Further examination
of this criterion functional shows that it is a function of
$N \times m$ variables $y_{ij}$: $i = 1, \ldots, N$; $j = 1, \ldots, m$.
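As an illustration, the criterion above can be evaluated directly for a given pair of configurations. The following is a minimal sketch, not Sammon's implementation; the function names `euclidean` and `sammon_stress` are hypothetical, and the sketch assumes that all points of $X$ are distinct so that every $d_{ij}$ is nonzero.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sammon_stress(X, Y, dist=euclidean):
    """Mapping error between configuration X and its image Y.

    Assumes all points of X are distinct, so that every d_ij > 0.
    """
    d_total = 0.0   # d = sum of d_ij over i < j
    weighted = 0.0  # sum of (d_ij - d*_ij)^2 / d_ij over i < j
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            d_ij = dist(X[i], X[j])
            d_star = dist(Y[i], Y[j])
            d_total += d_ij
            weighted += (d_ij - d_star) ** 2 / d_ij
    return weighted / d_total
```

A value of zero means every interpoint distance is reproduced exactly in the lower-dimensional space; the iterative optimization itself is not shown here.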

In certain cases, the product $N \times m$ may become so large that
optimizing such a criterion functional would be prohibitive in terms
of computational complexity. To combat this problem, a heuristic
relaxation method was devised by Chang and Lee [3] to perform
point-to-point feature extraction. Although this algorithm is
computationally less demanding than the one developed by Sammon, it
is still based upon the same criterion. As a result, it too treats
interpoint distance as the only depository of information present in
a data set.

Calvert [4] and White [5] have also contributed to the study of
point-to-point mappings by applying orthogonal projection theory to
the problem and by investigating the effect of using $L_1$ or
city-block distances instead of Euclidean distances, respectively.

3. The Search for Structure

When one is presented with the problem of designing a practical
pattern recognition system under the constraint of a small number of
training samples from a space of high dimensionality, one often
resorts to cluster analysis to learn something about the structure of
the data. Much work has been done in this area and, because many
clustering techniques might be best described as "ad hoc" or
heuristic, there exist a number of clustering procedures. For good
reviews of clustering techniques see [6], [7], and [8].

One broad class of clustering procedures is hierarchical clustering.
To begin, it is necessary to pose the problem of hierarchical
clustering. Let us consider the problem of clustering a set of $M$
objects $Z = \{Z_1, \ldots, Z_M\}$ into $C$ clusters
$W_1, W_2, \ldots, W_C$. We might think of clustering as no more than
the process of partitioning $M$ objects into $C$ groups
$W_1, \ldots, W_C$ where

$$\bigcup_{i=1}^{C} W_i = Z \quad \text{and} \quad W_i \cap W_j = \emptyset \ \text{if} \ i \neq j.$$

The process of hierarchical clustering is begun by partitioning the
$M$ objects of $Z$ into $M$ singleton clusters (clusters with only
one member). At the next level of the procedure, a new partition is
obtained consisting of $M-1$ clusters. At each successive level, the
number of clusters is reduced by one, and this process is terminated
when the desired number, $C$, of clusters is obtained. If the
procedure is such that whenever any two samples are in the same
cluster at a given level, they are also in the same cluster at any
higher level, then the clustering procedure is called hierarchical.
This type of clustering is widely used in biological taxonomy and
other classificatory sciences.
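The level-by-level merging just described can be sketched as a short loop. This is only an illustrative sketch: the names `agglomerate` and `nearest_pair` are hypothetical, and the use of the nearest-pair distance as the default distance between clusters is an assumption made here so that the example is complete.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_pair(A, B, points):
    """Distance between clusters A and B (lists of point indices),
    taken as the distance of their closest pair (an assumed default)."""
    return min(euclidean(points[i], points[j]) for i in A for j in B)

def agglomerate(points, n_clusters, cluster_dist=nearest_pair):
    """Start from singleton clusters and merge the two nearest clusters
    at each level until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # combinations yields index pairs (a, b) with a < b
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_dist(clusters[ab[0]], clusters[ab[1]], points),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Because clusters are only ever merged, two samples that share a cluster at one level share a cluster at every higher level, which is the hierarchical property stated above.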

There are two basic types of hierarchical clustering algorithms:
agglomerative and divisive. The method described above, in which two
clusters are merged at each level, is called agglomerative. Divisive
clustering occurs when at each level one cluster is split into two.
Regardless of the type of hierarchical scheme used, the way one
defines the distance between two clusters is critical, because in
agglomerative schemes the two nearest clusters are joined at each
stage, and in divisive schemes a single cluster is split to yield the
two clusters which are the farthest apart.

Some of the more widely used distance formulas are:

$$d_{\min}(X_i, X_j) = \min_{X \in X_i,\ X' \in X_j} d(X, X') \qquad (3\text{-}1)$$

$$d_{\max}(X_i, X_j) = \max_{X \in X_i,\ X' \in X_j} d(X, X') \qquad (3\text{-}2)$$

$$d_{\text{mean}}(X_i, X_j) = d(m, m') \qquad (3\text{-}3)$$

where $m$ = mean of $X_i$, $m'$ = mean of $X_j$, $X_i$ and $X_j$ are
clusters, and $d(\cdot, \cdot)$ represents some distance measure such
as Euclidean distance.
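The three cluster distances (3-1) to (3-3) translate almost directly into code. The sketch below assumes clusters are given as lists of coordinate tuples and uses Euclidean distance for $d(\cdot, \cdot)$; the function names are hypothetical.

```python
import math
from itertools import product

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def d_min(A, B, dist=euclidean):
    """(3-1): distance of the closest pair, one point from each cluster."""
    return min(dist(a, b) for a, b in product(A, B))

def d_max(A, B, dist=euclidean):
    """(3-2): distance of the most distant pair, one point from each cluster."""
    return max(dist(a, b) for a, b in product(A, B))

def mean_point(A):
    """Coordinate-wise mean of a non-empty cluster."""
    return tuple(sum(coords) / len(A) for coords in zip(*A))

def d_mean(A, B, dist=euclidean):
    """(3-3): distance between the two cluster means."""
    return dist(mean_point(A), mean_point(B))
```

For the clusters `[(0, 0), (1, 0)]` and `[(3, 0), (5, 0)]` the three measures differ (2, 5 and 3.5 respectively), which shows why the choice of measure changes which clusters are merged or split.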

Consider the use of $d_{\min}$ as the measure of distance between
clusters in an agglomerative scheme. We may think of the data points
as nodes in some graph. For a review of graph theory see Ore [9].
When $d_{\min}$ is used, nearest neighbors determine the nearest
subsets or clusters. In the terminology of graph theory, the merger
of $X_i$ and $X_j$ corresponds to adding an edge between the nearest
pair of nodes, one in $X_i$ and one in $X_j$. Since such edges
linking clusters always go between distinct clusters, the resultant
graph never contains any closed loops or circuits. Such a graph is
referred to as a tree. If this edge linking procedure is continued
until there is a path from any node to any other node, the resulting
graph is said to be a spanning tree. In addition, it can be shown
that the sum of the edge lengths of this spanning tree will never be
greater than the sum of the edge lengths of any other tree which
spans the nodes. Thus this graph is said to be the minimal spanning
tree (MST) for the data set, and this procedure is called the
nearest-neighbor or single-linkage clustering algorithm.
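For illustration, the MST of a small point set can be built as follows. Note that this sketch grows the tree one node at a time (Prim's construction) rather than by the cluster-merging procedure described above; both yield a minimal spanning tree. The name `prim_mst` and the edge representation `(i, j, length)` are assumptions of the sketch.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def prim_mst(points, dist=euclidean):
    """Return the N-1 MST edges as (i, j, length) tuples."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # shortest known link from the tree to each node
    parent = [-1] * n       # tree node realising that link
    edges = []
    if n:
        best[0] = 0.0       # grow the tree from node 0
    for _ in range(n):
        # pick the node outside the tree that is closest to it
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        # update the links of the remaining nodes through u
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges
```

This version recomputes distances on demand and runs in time proportional to $N^2$, which is adequate for the small training sets considered here.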

When the $d_{\max}$ measure of distance is used in an agglomerative
hierarchical clustering scheme, we obtain what is referred to as the
furthest-neighbor algorithm. Applying this algorithm to a set of data
can also be described in terms of graph theory. At each stage of the
hierarchy we produce a graph in which edges connect all the pairs of
nodes of a given cluster. In graph theoretic terminology, each
cluster is said to form a complete subgraph. Referring to the
definition of $d_{\max}$, we deduce that the distance between two
clusters is determined by the most distant pair of nodes belonging to
their union. This quantity is also referred to as the diameter of the
two clusters' union.

As was mentioned previously in connection with the single-linkage
algorithm, the concept of the minimal spanning tree (MST) is an
important one and was introduced by Prim in [10]. It is a deceptively
simple structure which has amazing properties for cluster analysis.
Zahn [11] was the first to demonstrate its powers in dealing with a
number of problems which were rendered unsolvable by other methods.
He showed that clusters in a two-dimensional space which the eye
identified immediately as separate entities could be separated
trivially by a clustering algorithm based on the MST. In fact, he
went so far as to suggest that the MST was the fundamental mechanism
responsible for proximity and Gestalt effects in psychology.

Some other approaches to cluster analysis using tree structures are
described and referred to in [12].

The main properties of the MST can be summarized as follows:

1. Any point (node) is connected to at least one of its nearest
neighbors.

2. Any subtree is connected to at least one of its nearest neighbors
by the shortest available path.

3. The MST minimizes all increasing symmetric functions of interpoint
distance.

4. The MST connectivity is invariant under any transformation which
preserves the rank order of interpoint distance.

5. The MST structure is easy to compute and resembles a loopless
skeleton of the configuration.

Thus we have found two attributes, the MST and the cluster diameter,
which in part tell something of the structure present in a data set.
In the following section, these two concepts are combined with that
of interpoint distance to yield a procedure for nonlinear feature
extraction.


4. Feature Extraction Procedure

For convenience, we shall now restate the feature extraction problem:

Let there be given a set of points $X = \{X_1, X_2, \ldots, X_N\}$
where each $X_i \in R^n$. We wish to find a corresponding set of
points $Y = \{Y_1, Y_2, \ldots, Y_N\}$ where each $Y_i \in R^m$
($1 < m < n$) in such a manner that the structure contained in the
data set $X$ is optimally preserved under the transformation
$g : R^n \to R^m$ which maps $X$ into $Y$. We take the liberty of
expressing $g(X_i)$ as $Y_i$ due to the one-to-one correspondence
between the sets $X$ and $Y$.

For $N$ points, there are $N(N-1)/2$ independent interpoint
distances. After determining these distances, the minimal spanning
tree (MST) for the data set can be constructed. Once this is
accomplished, a method of feature extraction based on a divisive
hierarchical clustering scheme can be implemented in the following
fashion.

Let us define the inconsistency measure for an edge of the MST in the
following manner. Suppose that there is an edge of the MST connecting
nodes represented by the points $X_i$ and $X_j$. We define the
inconsistency measure [11] of this edge as the ratio of the length of
the edge between $X_i$ and $X_j$ to the average length of all MST
edges connected to either $X_i$ or $X_j$ (excluding the edge from
$X_i$ to $X_j$). Since, for a data set of size $N$, the MST has $N-1$
edges, we can store the inconsistency measure for each edge of the
MST in an array, $E$, of dimension $N-1$.
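The inconsistency array $E$ can be computed in one pass over the MST edges. The sketch below is illustrative only: the name `inconsistency` and the `(i, j, length)` edge tuples are assumptions, and the value 1.0 returned for an edge with no neighboring edges (possible only when $N = 2$) is a convention chosen here, since the definition above leaves that case open.

```python
def inconsistency(edges):
    """Return the list E, aligned with `edges`, of inconsistency measures.

    edges: MST edges as (i, j, length) tuples.
    """
    # map each node to the indices of the MST edges that touch it
    incident = {}
    for idx, (i, j, _) in enumerate(edges):
        incident.setdefault(i, []).append(idx)
        incident.setdefault(j, []).append(idx)

    E = []
    for idx, (i, j, length) in enumerate(edges):
        # lengths of the other MST edges meeting either endpoint
        nearby = [edges[k][2] for k in incident[i] + incident[j] if k != idx]
        if nearby:
            E.append(length / (sum(nearby) / len(nearby)))
        else:
            E.append(1.0)  # assumed convention: no neighbours to compare with
    return E
```

For a chain with edge lengths 1, 1 and 8, the long edge receives the value 8 and is therefore the first candidate for deletion.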

Since we may think of clustering as merely a process of labeling
points according to their cluster or group membership, we can store
such membership in an array $C$ of dimension $N$, where $C_i$
contains the label or membership of point $X_i$; $C_i = j$ indicates
$X_i$ being placed in cluster $X_j$. So $C_i = 1$ for all $i$
initially.

To initiate the feature extraction procedure, the array $E$ is
searched to find the MST edge with the largest value for
inconsistency. We identify the endpoints of this edge and will denote
them as $X_k^1$ and $X_l^1$. The value for the diameter of the
cluster $X_1$ is also determined. Let the endpoints of this diameter
edge be denoted by $X_m^1$ and $X_n^1$ and the diameter be $D_1$.
Once these four points are known, we group them in what we call the
active set, $A^1$. So $A^1 = \{X_k^1, X_l^1, X_m^1, X_n^1\}$. This
set is termed the active set because it contains the vectors whose
images $Y_k^1$, $Y_l^1$, $Y_m^1$, $Y_n^1$ are optimally located in
the image or feature space at the first stage of the process.
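A sketch of how the active set might be assembled for one cluster is given below. The names `diameter_edge` and `active_set`, the index-based representation of points, and the `(i, j, length)` edge tuples are all assumptions of this illustration. Note that an endpoint of the most inconsistent edge may coincide with an endpoint of the diameter edge, in which case the active set holds fewer than four distinct points.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def diameter_edge(points, members, dist=euclidean):
    """Return (D, m, n): the largest interpoint distance among the point
    indices in `members`, together with its two endpoints."""
    best = (0.0, members[0], members[0])
    for a, i in enumerate(members):
        for j in members[a + 1:]:
            d = dist(points[i], points[j])
            if d > best[0]:
                best = (d, i, j)
    return best

def active_set(points, members, edges, E):
    """edges: MST edges (i, j, length) inside the cluster; E: their
    inconsistency values. Returns (sorted active indices, diameter)."""
    worst = max(range(len(edges)), key=lambda e: E[e])
    k, l, _ = edges[worst]                    # most inconsistent edge
    D, m, n = diameter_edge(points, members)  # diameter edge
    return sorted({k, l, m, n}), D
```

For four collinear points at 0, 1, 2 and 10, the most inconsistent edge joins the last two points and the diameter edge joins the first and last, so the active set contains three distinct points.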

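Thus once the membership of the active set is found, the image set
can be determined. This is done by finding the set of points

$$g(A^1) = \{\, Y_i : i \in I^1 \,\}$$

which minimize the following criterion:

$$Q^1(g(A^1)) = \sum_{i \in I^1} \sum_{\substack{j \in I^1 \\ i < j}} \frac{(d_{ij} - d^*_{ij})^2}{d_{ij}} \qquad (4\text{-}1)$$

under the constraint that

$$d^*_{ij} \le D_1 \quad \forall\, i, j \in I^1 \qquad (4\text{-}2)$$

where

$$I^1 = \{\, i : X_i \in A^1 \,\}, \qquad (4\text{-}3)$$

$$d_{ij} = d(X_i, X_j), \qquad (4\text{-}4)$$

$$d^*_{ij} = d(Y_i, Y_j). \qquad (4\text{-}5)$$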

The above is the standard Sammon criterion except that we have added
the inequality constraint (4-2) and we do not require the distances
(4-4) and (4-5) to be Euclidean. Note also that we are applying this
criterion to only four points rather than, as in [2], to the entire
data set.

The optimization of (4-1) subject to (4-2) may be carried out by the
iterative method found in the Appendix of this paper.

Once the optimal configuration of $g(A^1)$ is found, the locations of
the points $Y_k^1$, $Y_l^1$, $Y_m^1$, and $Y_n^1$ are fixed and are
not allowed to vary through the remaining part of the procedure. As
an indication of this fact, the vector $B$ of dimension $N$ is
constructed. If the component $B_i = 1$, this indicates that the
image of $X_i$ has been located. So initially $B_i = 0$
($i = 1, \ldots, N$), but after the first stage $B_i = 1$ for
$X_i \in A^1$. Before we begin the next stage of the algorithm, we
must update the cluster membership on the basis of deleting the most
inconsistent edge found earlier and creating two connected subgraphs
with the remaining edges of the MST. At this step, some of the
components of $C$ will be changed to reflect changes in cluster
membership.
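The membership update can be sketched as an edge deletion followed by a graph search. In this illustration the name `split_on_edge`, the representation of edges as `(i, j)` index pairs, and the choice of which of the two components receives the new label are all assumptions.

```python
def split_on_edge(n, edges, cut, C, new_label):
    """Delete MST edge `cut` and relabel one of the two resulting components.

    n: number of points; edges: remaining MST edges as (i, j) pairs;
    cut: the edge to delete; C: current membership list.
    Returns (remaining edges, updated membership list).
    """
    remaining = [e for e in edges if e != cut]
    adjacent = [[] for _ in range(n)]
    for i, j in remaining:
        adjacent[i].append(j)
        adjacent[j].append(i)

    # depth-first search from one endpoint of the deleted edge
    seen = {cut[1]}
    stack = [cut[1]]
    while stack:
        u = stack.pop()
        for v in adjacent[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)

    new_C = [new_label if i in seen else C[i] for i in range(n)]
    return remaining, new_C
```

Starting from `C = [1, 1, 1, 1]` on a four-node chain, deleting the edge `(2, 3)` with new label 2 gives `[1, 1, 1, 2]`; a later deletion of `(0, 1)` with label 3 gives `[1, 3, 3, 2]`.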

In general, we repeat the above procedure until all the points of $X$
have their images mapped into $Y$. Let us assume that we are at the
$K$th stage of the algorithm. A search of $E$ is conducted to find
the most inconsistent remaining edge of the MST. The endpoints of
this edge, $X_k^K$ and $X_l^K$, as well as the endpoints of the
diameter edge, $X_m^K$ and $X_n^K$, for the cluster which contains
the most inconsistent edge are determined. These points are then
placed in the active set $A^K$. If any points belonging to $A^K$ have
already been mapped into images in $R^m$, it is understood that they
will not be moved at this stage of the algorithm. Suppose that the
cluster containing the most inconsistent edge is designated cluster
$X_p$ and that the diameter of $X_p$ is $D_p$. If $\Delta(i, j)$ is
the standard Kronecker delta function, the criterion functional for
the $K$th stage is

$$Q^K(g(A^K)) = \sum_{i \in I^K} \sum_{j=1}^{N} \Delta(p, C_j) \cdots (d_{ij} - d^*_{ij})^2 \qquad (4\text{-}6)$$

under the constraint that

$$d^*_{ij} \le D_p \quad \forall\, i, j \ni X_i \text{ and } X_j \in X_p, \qquad (4\text{-}7)$$

where

$$I^K = \{\, i : X_i \in A^K \,\}. \qquad (4\text{-}8)$$

The above criterion is minimized by the method outlined in the
Appendix.

The constraint (4-7) is added to ensure that no two points in the
image set of $X_p$ are farther apart than the diameter of $X_p$.
Examination of the criterion shows that the configuration
$g(A^K) = \{Y_k^K, Y_l^K, Y_m^K, Y_n^K\}$ is optimized with respect
to intracluster interpoint distances, with added emphasis on the
diameter edge length and the inconsistent edge length. After the
optimization is performed, the cluster membership is updated by
setting $C_i = K$ for all values of $i$ representing points belonging
to one of the two connected components of cluster $X_p$ formed by
deletion of the most inconsistent edge.

This procedure is repeated until all the $N$ points are mapped to
$R^m$. For a flow chart of this entire operation refer to Fig. 1.

5. Determination of the dimensionality of the feature space

The final feature extraction obtained depends upon the value, $m$, of
the dimensionality of the feature space. The problem remains as to
how to make a reasonable choice for $m$.

In [13], Schwartzmann and Vidal present an algorithm for estimating
the topological or intrinsic dimensionality of point sets. Since we
are concerned in our algorithm with preserving the structure in a
data set, it seems only natural that we concern ourselves with
determining the topological dimensionality. The algorithm of
Schwartzmann and Vidal relies heavily on the MST, just as the
hierarchical clustering presented earlier does. This approach is
iterative and is also based on Karhunen-Loève theory along with the
theory of barycentric transformations. As a result, before beginning
the feature extraction process, we perform the Schwartzmann-Vidal
algorithm to get an estimate of the topological dimensionality to be
used as the value for $m$.

Recall that in the discussion of the MST, it was stated that any
transformation of a data set which preserves the rank ordering of the
interpoint distances also preserves the MST connectivity. This is
intimately connected with the work of Shepard [14], who was concerned
with preserving monotonic relationships. Since the MST has been shown
to have important clustering and Gestalt properties [11], it would be
desirable to preserve its connectivity under transformation. To do
so, it would be required that the image set $Y$ be such that for each
inequality $d(X_i, X_j) < d(X_k, X_l)$ we would have
$d(Y_i, Y_j) < d(Y_k, Y_l)$. With $N(N-1)/2$ interpoint distances,
this translates to

$$\frac{N(N-1)}{2} \cdot \left( \frac{N(N-1)}{2} - 1 \right) \Big/ 2$$

pairwise inequality constraints. In most cases, such a large number
of constraints would greatly hinder optimization.
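To give a sense of scale, the constraint count can be evaluated for a few values of $N$. The helper name `n_rank_constraints` below is hypothetical.

```python
def n_rank_constraints(N):
    """Number of pairwise inequalities among the N(N-1)/2 interpoint distances."""
    pairs = N * (N - 1) // 2          # number of interpoint distances
    return pairs * (pairs - 1) // 2   # number of pairs of distances

for N in (10, 100):
    print(N, n_rank_constraints(N))
# prints:
# 10 990
# 100 12248775
```

Even a modest data set of 100 points would thus require more than twelve million inequality constraints, which motivates the weaker rank-order check described next.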

With this in mind, $N \times N$ arrays $R$ and $R^*$ are constructed
so that $R_{ij}$ gives the rank order of the distance between $X_i$
and $X_j$ and $R^*_{ij}$ gives the rank order of the distance between
$Y_i$ and $Y_j$. We can then use the value of $\|R - R^*\|$ as a
means for determining the "goodness" of the transformation. If the
value of $\|R - R^*\|$ is too large, the feature extraction would be
repeated with the value for the dimensionality of the feature space
increased. Increasing the value of $m$ would allow a greater number
of degrees of freedom and would thus allow more freedom in preserving
the rank order of interpoint distance.
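The rank-order comparison can be sketched as follows. The function names are hypothetical, the Frobenius norm is an assumed choice since the text does not specify which norm is meant, and ties between equal distances are broken by pair order, which is also an assumption.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_matrix(points, dist=euclidean):
    """N x N symmetric matrix whose (i, j) entry is the rank (1 = smallest)
    of the distance between points i and j; the diagonal is zero."""
    n = len(points)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    ordered = sorted(pairs, key=lambda p: dist(points[p[0]], points[p[1]]))
    R = [[0] * n for _ in range(n)]
    for rank, (i, j) in enumerate(ordered, start=1):
        R[i][j] = R[j][i] = rank
    return R

def rank_discrepancy(R, R_star):
    """Frobenius norm of R - R* (the choice of norm is an assumption)."""
    return math.sqrt(sum((a - b) ** 2
                         for row, row_star in zip(R, R_star)
                         for a, b in zip(row, row_star)))
```

A discrepancy of zero means the mapping preserved the rank order of every interpoint distance, and hence the MST connectivity.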

Conclusions 

An algorithm for nonlinear feature extraction has been presented. It
emphasizes structures from graph theory such as the minimal spanning
tree, maximally complete subgraphs, inconsistent edges, and diameter
edges as important entities to be preserved under transformation. The
procedure is hierarchical in nature. By solving a number of small
problems instead of one large problem, this procedure greatly reduces
the complexity of computer implementation. This approach shows great
promise, especially when one is not sure of class conditional
statistics, for it is based on other structures present in the data.
Presently, computer programs are being developed to implement this
algorithm at Rice University. Numerical results on the application of
this algorithm to fluorescence and infrared spectroscopy data for oil
spill identification will appear shortly.

APPENDIX

At each stage of the feature extraction algorithm, we are concerned
with finding the configuration $g(A^K) = Y^K$ which optimizes

$$Q^K(g(A^K)) = \sum_{i \in I^K} \sum_{j=1}^{N} \Delta(p, C_j) \cdots (d_{ij} - d^*_{ij})^2 \qquad (A\text{-}1)$$

where

$$I^K = \{\, i : X_i \in A^K \,\} \qquad (A\text{-}2)$$

under the constraint that

$$d^*_{ij} \le D_p \quad \text{for all } i, j : X_i, X_j \in X_p. \qquad (A\text{-}3)$$

This sort of problem is well suited for solution by an algorithm
proposed by Fiacco and McCormick [15]. Let us denote $Q^K(g(A^K))$ by
$F(Y_1, Y_2, Y_3, Y_4)$. Suppose that there are $M$ pairs of points
$(X_i, X_j)$ such that both are in $X_p$. We can express the
inequality constraints as

$$G_k(Y_1, Y_2, Y_3, Y_4) \ge 0, \quad k = 1, \ldots, M. \qquad (A\text{-}4)$$

The procedure developed by Fiacco and McCormick can be applied to
this problem as follows:

1. A modified objective function is formulated consisting of the
original function to be minimized and penalty functions with the
form:

$$P = F - r \sum_{k=1}^{M} \ln G_k, \qquad (A\text{-}5)$$

where $r$ is a positive constant. As the algorithm proceeds, $r$ is
reevaluated to form a monotonically decreasing sequence,
$r_1 > r_2 > \ldots > 0$. As $r$ grows small, under suitable
conditions $P$ approaches $F$ and the problem is solved.

2. Select a starting point (feasible or nonfeasible) and an initial
value for $r$.

3. Determine the minimum of the objective function for the current
value of $r$ by an unconstrained gradient technique.

4. Estimate the optimal solution using extrapolation formulas.

5. Select a new value for $r$ and repeat the procedure until
convergence criteria are satisfied.

A logic diagram for this process is given in Fig. 2.
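The steps above can be sketched on a toy problem. This is a simplified illustration under stated assumptions, not the Fiacco-McCormick procedure itself: the function name `barrier_minimize` and its parameters are hypothetical, the inner minimization uses a finite-difference gradient with step halving, the extrapolation of step 4 is omitted, and the starting point must be strictly feasible because the logarithmic term of (A-5) is undefined otherwise.

```python
import math

def barrier_minimize(F, G, y0, r0=1.0, shrink=0.1, outer=5, inner=200, h=1e-6):
    """Minimise F(y) subject to g(y) > 0 for every g in G, using the
    modified objective P = F - r * sum(ln g) with a decreasing sequence of r.

    Assumes y0 is strictly feasible (a requirement of this sketch only).
    """
    def P(y, r):
        values = [g(y) for g in G]
        if min(values) <= 0:
            return math.inf  # outside the feasible region
        return F(y) - r * sum(math.log(v) for v in values)

    y, r = list(y0), r0
    for _ in range(outer):
        for _ in range(inner):
            base = P(y, r)
            # central-difference estimate of the gradient of P
            grad = []
            for k in range(len(y)):
                up, down = y[:], y[:]
                up[k] += h
                down[k] -= h
                grad.append((P(up, r) - P(down, r)) / (2 * h))
            # halve the step until the modified objective decreases
            step = 1.0
            while step > 1e-12:
                trial = [yi - step * gi for yi, gi in zip(y, grad)]
                if P(trial, r) < base:
                    y = trial
                    break
                step *= 0.5
            else:
                break  # no improving step found for this value of r
        r *= shrink  # next, smaller value of r
    return y

# Toy problem (hypothetical): minimise (y - 2)^2 subject to 1 - y >= 0.
solution = barrier_minimize(lambda y: (y[0] - 2.0) ** 2,
                            [lambda y: 1.0 - y[0]],
                            [0.0])
```

In the toy problem the unconstrained minimum lies at $y = 2$, outside the feasible region, so the iterates approach the boundary $y = 1$ from inside as $r$ decreases. In the feature extraction setting, each $G_k$ would presumably take the form $D_p - d^*_{ij}$ for one pair of points in the cluster.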
Fig. 1. Flowchart for Feature Extraction Procedure